Global Pairwise Sequence Alignments with Multiple Scoring Matrices
Abstract
The need to repeat alignments on the same pair of sequences, in order to select an appropriate scoring function and assess the significance of the score, motivates our work on a higher-performance matching technique. By overlapping the alignments obtained with a set of typical scoring matrices, each with its default gap cost, we observe significant parameters that may induce a maximum deviation from a reference global alignment. We propose the percent identity and the wideness of the alignment as candidate parameters. We sample pairs of protein sequences from the ASTRAL database to generate the corresponding probability distribution with respect to the proposed parameters. When biologists conduct a trial-and-error global matching procedure based on costly dynamic programming, our 'static' search-and-selection scheme can be 12 times faster than a normal alignment algorithm when repeating alignments with 3 scoring matrices, with a reasonably high alignment accuracy of between 85% and 95%.
Keywords: global sequence alignment, multiple scoring matrices
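The trial-and-error procedure that the abstract describes, repeating a dynamic-programming global alignment on one pair of sequences under several scoring schemes and comparing the results, can be sketched as follows. This is a minimal illustration, not the authors' method: it uses a plain Needleman-Wunsch recurrence with a linear gap cost, and the three entries in `TOY_SCHEMES` are hypothetical match/mismatch/gap values standing in for real substitution matrices and their default gap costs. The helper names (`global_align`, `percent_identity`, `align_with_schemes`) are assumptions made for this example. Percent identity is computed over alignment columns; the abstract's "wideness" parameter is not defined there and is not modelled.

```python
from typing import Callable, Dict, Tuple


def global_align(a: str, b: str, score: Callable[[str, str], int],
                 gap: int) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap cost."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if not row_a:
        return 0.0
    same = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * same / len(row_a)


# Hypothetical toy schemes: (match, mismatch, gap). Real use would substitute
# actual substitution matrices with their default gap costs.
TOY_SCHEMES: Dict[str, Tuple[int, int, int]] = {
    "strict": (1, -3, -4),
    "medium": (1, -1, -2),
    "lenient": (2, -1, -1),
}


def align_with_schemes(a: str, b: str) -> Dict[str, Tuple[int, str, str, float]]:
    """Repeat the global alignment once per scheme and report percent identity."""
    results = {}
    for name, (match, mismatch, gap) in TOY_SCHEMES.items():
        def score(x: str, y: str, match=match, mismatch=mismatch) -> int:
            return match if x == y else mismatch
        total, row_a, row_b = global_align(a, b, score, gap)
        results[name] = (total, row_a, row_b, percent_identity(row_a, row_b))
    return results
```

Each call to `global_align` fills a full (n+1) by (m+1) table, so repeating it for three schemes triples the dynamic-programming cost. That repeated cost is what the abstract's 'static' selection scheme aims to avoid.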
Similar resources
Global Multiple Sequence Alignment
As with pairwise alignment, multiple sequence alignments (MSAs) are typically scored by assigning a score to each column and summing over the columns. The most common approach to scoring individual columns in a multiple alignment is to calculate a score for each pair of symbols in the column, and then sum over the pair scores. This is called sum-of-pairs or SP-scoring. For global multiple seque...
Estimating Pairwise Statistical Significance of Protein Local Alignments Using a Clustering-Classification Approach Based on Amino Acid Composition
A central question in pairwise sequence comparison is assessing the statistical significance of the alignment. The alignment score distribution is known to follow an extreme value distribution with analytically calculable parameters K and λ for ungapped alignments with one substitution matrix. But no statistical theory is currently available for the gapped case and for alignments using multiple...
Score distributions of gapped multiple sequence alignments down to the low-probability tail.
Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution ...
On the significance of sequence alignments when using multiple scoring matrices
MOTIVATION Pairwise local sequence alignment is commonly used to search data bases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices available to choose from, and in the l...
Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins.
Three-dimensional cluster analysis offers a method for the prediction of functional residue clusters in proteins. This method requires a representative structure and a multiple sequence alignment as input data. Individual residues are represented in terms of regional alignments that reflect both their structural environment and their evolutionary variation, as defined by the alignment of homolo...
Publication date: 2006